# High Performance And High Speed Multiplier Using Modified Booth Algorithm With Reversible Logic Synthesis And Hybrid Carry Look Ahead Adder

VishnuPriya M<sup>1</sup>, Sujitha V<sup>1</sup>, Tharsana C<sup>1</sup>, Pushparaj B<sup>2</sup>

Abstract This paper describes the Design of Multiplier and Accumulator Unit by means of Reversible Logic synthesis and Hybrid Carry Look ahead adder. The Important measure to increase the speed of multiplier is to reduce the partial products because multiplication precedes a series of addition .For that Modified Booth Algorithm is used in CSA, it treats Positive and Negative numbers equally. Hybrid CLA is used to control overall MAC Delay. Reversible logic gates have ability to to reduce the power dissipation which is the main requirement in low power VLSI design. The advantage of using Reversible Logic Synthesis is to have reduced Complexity, area and Delay. The proposed Hybrid CLA exhibits High Performance regards power consumption, delay. Multipliers based on Wallace reduction tree provide an area-efficient strategy for high speed multiplication. The Proposed MAC unit provides efficient reduced Critical path delay, area overhead. This new high speed hybrid carry look-ahead adders are simulated by Xilinx ISE Simulator

**.Index Terms**— Multiplier and accumulator (MAC), modified booth algorithm (MBA), Hybrid carry look-ahead adder (CLA), reversible logic gate (RLG), ripple carry adder (RCA).

#### 1 INTRODUCTION

Multipliers are key components of many high performance systems such as FIR filters, microprocessors, digital signal processors, etc. With the recent advances in technology, many researchers have worked on the design of increasingly more efficient multipliers. They aim at offering higher speed and lower power consumption even while occupying reduced silicon area [2].Generally, Researches made models for achieving reduced delay. Multiplier based on Wallace Tree along with Hybrid CLA is very faster than normal multiplier. Now-a-days, different architectures are introduced to design a better Wallace multiplier architecture [1].

The basic multiplication principle is twofold i.e. evaluation of partial products and accumulation of the shifted partial products [4]. . Most of DSP applications such as fast Fourier transform (FFT) require additions and multiplications. Since the multipliers have a significant impact on the performance of the entire system, many high-performance algorithms and architectures have been proposed to accelerate multiplication [5].

<sup>1</sup> UG Students, Department of Electronics and Communication Engineering, Renganayagi Varatharaj College Of Engineering, Sivakasi, India.

<sup>2</sup>Assistant Professor, Department of Electronics and Communication Engineering, Renganayagi Varatharaj College Of Engineering, Sivakasi, India.

A Multiplier can be divided into further three stages: The Initial stage is Tree for reducing the partial products.



Figure 1:: Block diagram of Wallace tree

Here Partial products generation (PPG) stage is the second stage in which the multiplicand and the multiplier are multiplied bit by bit to generate the partial products. Partial products addition stage or reduction of partial products (PPR) is very important part in which it is the most complicated and that determines the speed. Finally, Adder is used as third stage. We have proposed the design of adder as different from others.

In Recent Development Digital Electronics, Reversible Logic Synthesis play a vital role. Reversible logic gates have ability to reduce the power dissipation which is the main requirement in low power VLSI design. Because, energy will not be dissipated from a system as long as the system allows the reproduction of the inputs from observed outputs. Reversible computations can generate inputs from outputs and can stop and go back to any point in the computation history. It also Reduces Quantum Cost, number of garbage values.

The hardware requirements is done in terms of full adder (FA) and the length of final adder (FAL) for different size of Wallace multiplier is obtained [2].Here the Proposed Model describes the Parallel MAC unit with Modified Booth Algorithm by Reversible Logic synthesis and Hybrid CLA is used instead of conventional CLA. Section 2discuss the Modified Booth Algorithm. Section 3 discusses the Overview of Conventional CLA. Section 4 discusses the Proposed Hybrid CLA. Section 5 discuss the Overview of Reversible Logic Synthesis. Section 6 discuss the Parallel MAC and Architecture of Wallace based CSA. Section7 discuss the Conclusion.

# 2 ARCHITECTURE OF MODIFIED BOOTH ALGORITHM

In this Novel method, Modified Booth Algorithm is used for achieving High Speed than conventional booth Algorithm.



Fig. 2: Modified Booth multiplier with 3-stage pipelines

MBA reduces the Partial Product to half number than conventional booth. Figure 2 shows the Modified Booth algorithm with pipelined stages. In N-bit modified booth algorithm, the number of partial products is N/2.By that, there will be four partial products in 8-bit modified booth multiplier. The usage of Wallace Tree is to add three rows and generate two rows. Carry save adders provides good performance in Wallace tree. Mostly, Ripple Carry adder has large critical path delay. Realization and reduction in the number of partial product addition stages are using multi-bit compressors. Here 4-2 Compressor is used .

IJSER © 2017 http://www.ijser.org



Figure 3: 4:2 Compressor with universal gates

In this Compressor, the efficient outputs are generated at each stage, they are available to the multipliers much head of the inputs Critical path delay is minimized by replacing the XOR gate by multiplexer Blocks.

### **3 OVERVIEW OF CONVENTIONAL CLA**

Extensive circuit delay is produced in RCA because of many gates being used in the carry path from LSB to MSB [3].In case of Parallel Adder, the speed with which an addition can be performed is governed by the time required for the carriers to propagate or ripple through all stages of the adder. The Carry look ahead adder speeds up the process by eliminating this ripple carry delay. Generally. Delay of the addition process depends mainly on the size of the operands. The Method of speeding up the addition process is based on two additional functions of full adder called the *Carry Generate* and *Carry Propogate*. Here generating (G) and propagating (P) concepts are used to generate carries.



N+1 carry signals C[n-1:0] are generated by

P[n-1:0],G[n-1:0] and carry-in-signal according to equations (1),(2),(3),the carry depends upon the number of input bits.

(3)

$$C_{1} = C_{0}P_{0} + G_{0}$$
(1)  

$$C_{2} = C0P0 P1 + P1G0 + G1(2)$$
  

$$C_{3} = C0P0 P1P2 + P1 P2 G0 + G1 P2 + G2$$
  

$$C_{n-1} = C_{in} \cdot \prod_{i=0}^{n-2} P_{i} + \sum_{i=0}^{n-2} G_{i} \cdot \prod_{i=i+1}^{n-2} P_{j}$$

We can write the carry at bit i as,

IJSER © 2017 http://www.ijser.org International Journal of Scientific & Engineering Research, Volume 8, Issue 2, February-2017 ISSN 2229-5518

$$C_i = C_{in} \prod_{j=0}^{i-1} P_i + \sum_{j=0}^{l-1} G_i \cdot \prod_{k=j+1}^{i-1} P_k , \quad 0 \le i \le n$$

These are the steps and equations taken for addition process using Conventional Carry Look-ahead Adder.



Fig. 5. 4-bit Conventional CLA

#### **4 PROPOSED HYBRID CLA**

The Circuit for Carry look ahead adder introduces a delay corresponding to two gate levels. In order to reduce the delay of Conventional Adder, Hybrid carry Look ahead adder is proposed.



Fig. 6: Hybrid CLA for 16-bit

Here three stages are considered as like hierarchical carry look ahead adder. Figure 5 shows the design of Hybrid CLA.

In the first stage, propagate and generate values are produced by giving two n-bit number as inputs. After that, Intermediate carry and overall carry are generated. Now, top module should hold the carry because it occupy more space. The merit of it is no need of generation of carry propagate(Pout) and carry generate(Gout). The sum of every bit is generated simultaneously by generating the carry of whole design at bottom stage. By doing so, the area and delay is reduced. Also High Speed can be achieved.

P: an indicator of propagated carry to the component. In other words, the carry is propagated through the entire component if and only if it was propagated through every one of the components from the previous stage. The equation (4) is for the 4-bit module.

$$\mathsf{P} = \prod_{i=0}^{n-1} P_i$$

IJSER © 2017 http://www.ijser.org P = P0 P1P2P3(4)

G: an indicator of generated carry to the component. In other words, the carry is generated through the entire component if and only if it was generated through every one of the components from the previous stage. The equation (5) is for the 4-bit module.

$$G = \sum_{i=0}^{n-1} G_i \cdot \prod_{j=i+1}^{n-1} P_j$$

 $G= P_1P_2P_3G_0 + G_1P_2P_3 + G_2P_3 + G_3(5)$ 



Fig.7: Constructing 2-input XOR gate using 2input Bubbled NOR gates

The overall structure is designed by using Bubbled NOR gate. According to the analysis, Bubbled NOR gate is better than all other gates because lesser area and less delay can be achieved and moreover less power is consumed over NAND gate. We use Four 2-input Bubbled NOR gate instead of 2input XOR gate to get the same result. Figure 7 shows the design using four 2-input Bubbled NOR gate instead of 2-input XOR gate

## **5 OVERVIEW OF REVERSIBLE LOGIC SYNTHESIS**

Energy loss is a necessary consideration in digital circuit design, also known as circuit synthesis .Power Consumption is very much reduced.

A reversible logic circuit should have the following features:

- $\cdot$  Use minimum number of reversible gates.
- $\cdot$  Use minimum number of garbage outputs.
- $\cdot$  Use minimum constant inputs.

Assuming that every transistor out of more than  $4 \times 107$  for Pentium-4 technology dissipates heat at a rate of the processor frequency, for instance 2 GHz, the figure becomes  $4 \times 1019 \times kT \ln 2$  J/sec. The processor's working temperature is greater than 400 degrees Kelvin, which brings us to  $24 \times 1021 kT \ln 2$  [6].

The multiple output Boolean function F(x1; x2; ...; xn) of *n* Boolean variables is called reversible if the number of outputs is equal to the number of inputs; Any output pattern has a unique reimage. In other words, reversible functions are those that perform permutations of the set of input vectors. Every gate output that is not used as input to other gate or as a primary output is called garbage. **Garbage** is the number of outputs added to make an *n*-input *k*-output function ((*n*; *k*) function) reversible.

*Input* + *constant input* = *output* + *garbage*.

The Quantum Cost of 1\*1 Reversible gates is zero, and Quantum Cost of 2\*2 Reversible gates is one [6]. Reversible logic is used in *quantum computation, nanotechnology and other low power digital circuits.* 

One bit of information loss, dissipates

k T ln 2 Joules of energy

International Journal of Scientific & Engineering Research, Volume 8, Issue 2, February-2017 ISSN 2229-5518

Where, k is Boltzmann's constant and T is absolute temperature. For room temperature, energy loss is small for one bit,  $2.9 \times 10^{-21}$  J

Physically Reversible and Logically Reversible are the two types..There are many types of reversible logic gate, each having different logic function: Feynman Gate, Double Feynman Gate (F2G), Toffoli Gate, Fredkin Gate, Peres Gate etc. We use three gates Feynman, Toffoli and New gate for the designing of Full adder.

#### 5.1: Feynman / CNOT Gate

Controlled NOT (CNOT) gate is an example for a  $2^2$  gate. The Reversible  $2^2$  gate with Quantum Cost of one having mapping input (A, B) to output (P = A, Q = A B) is as shown in the Fig 8.



#### Fig. 8: Block diagram

#### 5.2:Toffoli Gate:

The 3\*3 Reversible gate with three inputs and three outputs. The inputs (A, B, C) mapped to the outputs (P=A, Q=B, R=A.B C) is as shown in the Figure 9.



Fig. 10: Block diagram

Toffoli gate is one of the most popular Reversible gates and has Quantum Cost of 5.

#### 5.3:New Gate:

The New gate has three inputs, two outputs and one garbage output. The inputs named as input vector  $I_v$  (A,B,C) and output is named as output vector  $O_v$  (A, AB $\oplus$ C, A'C' $\oplus$ B'). It is also called 3\*3 New gate. The block diagram for new gate is shown in the



figure 11.

#### Fig.10: Block diagram

#### 6. Proposed Parallel MAC Unit

In Previous novel design, Multiplier using improvised Booth Algorithm by Reversible Gate Logic and also substituting the conventional CLA with hybrid CLA. The main difference is in designing the overall structure using NAND gate. Instead of 2input XOR gate, we can use four 2-input NAND gate to get less power. The architecture from the conventional method used normal full adder and had maximum delay. But in this proposed work, they achieved significant improvement in delay with little increase in power. All logic modules are implemented in terms of Bubbled NOR gates to design a high speed hybrid carry look-ahead adder.

Bubbled NOR gate has better area and delay over XOR gate for static CMOS technology. A Full adder is designed in MAC operation by using reversible logic gate. Here the Proposed MAC architecture is done by using hybrid CLA instead of conventional CLA and also using Reversible logic full adder instead of conventional full adder.



Fig 11: Parallel MAC unit based on MBA [7]

Novel area efficient and high speed MAC architecture is proposed in which an improvement is over the current conventional Architecture. By using hybrid reduction through HA, FA and CLA, the performance is improved. Through this more efficiency and speed is attained. The normal full adder in the CSA is replaced by a new reversible logic gate In addition,. The circuit is simulated using Verilog HDL and synthesized in Xilinx ISE Simulator. We anticipate the proposed MAC can be used in high speed DSP application. In Fig.11 the proposed structural design has been portrayed and it contains a Booth encoder, CSA and the final addition stage. The tree with the reversible logic gates that had replaced the normal FAs which improve the delay of the adder. Conventional CLA is replaced by Hybrid CLA to add the sum [S] and carry [C]. The structural design of the hybrid CSA has been shown in Fig.9,

which performs 8X8-bit operation. It was developed based on the existing architecture. In Fig.10, Sum[i] and Carry[i] are corresponding to the i<sup>th</sup> bit of the feedback sum and carry. Z[i] is the i<sup>th</sup> bit of the sum of the lower bits for each partial product that were added in advance. Since the multiplier is for 8 bits, totally four partial products (Q<sub>0</sub>[7:0]-Q<sub>3</sub>[7:0]) are generated from the Booth encoder. The Carry Save Adders a type of Digital adder, used in Computer Micro architecture to compute the sum of three or more n-bit numbers in binary. Minimum four rows of reversible full adder is needed for four partial product.



Fig.12: Architecture of CSA

The grey square in Fig. 12 represents a HA and the white square represents a reversible logic FA. The rectangular symbol with five inputs shows a 2-bit CLA with a carry input. The critical path of this CSA is decided by the 2-bit CLA. It is also possible to use FAs to implement the CSA without using CLA. Instead of this conventional CLA we use a hybrid CLA to improve the performance. However, the number of bits in the final adder will be increased even if the lower bits of the earlier partial products are not processed in advance by the CLA. So when the MAC is considered, its performance is reduced. The proposed CSA structural design have been compared and briefed with the other existing architectures in Table I. The biggest differences be-

tween proposed and the others is that we used reversible full adder and hybrid carry look-ahead adder in place normal full adder and conventional CLA.

# 7 IMPLEMENTATION

Here we have implemented and analyzed the Proposed MAC. The Results are compared with conventional design. We made new Hybrid CLA and Full Adder with Reversible logic gates. By doing like this, we can reduce the number of logic gates and garbage outputs. Moreover the Quantum cost also reduced with no Fan-out.

Table II defines the Delay concern in the corresponding bits taken.

TABLE I

| Refer-<br>ences       | [9]                                         | [5]                                  |           | Proposed me-<br>thod                                  |
|-----------------------|---------------------------------------------|--------------------------------------|-----------|-------------------------------------------------------|
| Accumu-<br>lat<br>ion | Result<br>data of<br>final<br>addi-<br>tion | data a<br>of o<br>final <sup>1</sup> |           | Sum and car-<br>ry<br>of<br>CSA                       |
| CSA tree              | FA                                          | 2-bits<br>CLA                        | НА,<br>2- | Reversible<br>logic FA,<br>HA and 2-bit<br>hybrid CLA |

METHODOLGY

TABLE II

Delay time analysis

| Addition   | Conventional<br>CLA |         | Proposed me-<br>thod(HYBRID<br>CLA) |          |    | TABLE II<br>TABLE III<br>Area and Power Report |      |                       |           |  |
|------------|---------------------|---------|-------------------------------------|----------|----|------------------------------------------------|------|-----------------------|-----------|--|
| 4-BIT      | 8.92ns              |         | 8.29ns                              |          |    |                                                |      |                       |           |  |
|            |                     |         |                                     |          | -[ | 4-bit Addi                                     | tion | Area(µm <sup>2)</sup> | Power(µW) |  |
| & RIT      | 12 16n              | 12.16ng |                                     | 19 81 no |    |                                                |      |                       |           |  |
| Conventior | nal                 | 161.88  |                                     | 42.85    |    |                                                |      |                       |           |  |
| CLA        |                     |         |                                     |          |    |                                                |      |                       |           |  |
| Hybrid CLA |                     | 200.01  |                                     | 48.28    |    |                                                |      |                       |           |  |

By using Conventional CLA with Normal Full adder has maximum Delay. In this, We achieve Minimum Delay with little increase in power.



#### Comparison table for MAC

| Parameter            | MBA[7]             | Proposed MAC    |
|----------------------|--------------------|-----------------|
| No of Slices<br>used | 178 out of<br>4656 | 165 out of 4656 |
| No of LUT's          | 355 out of<br>9312 | 302 out of 9312 |
| Total Delay          | 34.35ns            | 25.45           |

#### Figure 13:Simulation Output:

| D B V A G        | ∑x§ ¤¤ ∦,¥]    | I V GG    |             | FFDF         | 8 C 3      | 10.31        | 1 1× 1008    | 1            | 1 000.0 49 pc |
|------------------|----------------|-----------|-------------|--------------|------------|--------------|--------------|--------------|---------------|
| Name             | Take           | 1.00.0428 | 1,000,043 # | 1.000,044 28 |            | 1.000,046,98 | 1.001.147.08 | 1,000,049 24 | 1,000,049 (20 |
| ► 📲 474<br>₩ 925 | 11110018       |           |             |              | 1110       |              |              |              |               |
| 10.00            |                |           |             |              | 1.0012     |              |              |              |               |
|                  |                |           |             |              |            |              |              |              |               |
| M asa            | 00000001111100 |           |             |              | 50000000 E | 111150       |              |              |               |
| 10.00            | 1              |           |             |              |            |              |              |              |               |
| ► 10 100         | 4414<br>1315   |           |             |              | 111        |              |              |              |               |
| ► 🦞 40-5         |                |           |             |              | 111        |              |              |              |               |
| 100 64           |                | _         |             |              |            |              |              |              |               |
| lig tus          | 1              |           |             |              |            |              |              |              | ł             |
| letd<br>leta     | 1              | _         |             |              |            |              |              |              |               |
| lig to           | 4              |           |             |              |            | _            |              |              |               |

#### **8 CONCLUSION**

This Work presents a New MAC structure, employed with Modified Booth Algorithm which is very faster than conventional algorithm. The Conventional CLA is replaced by Hybrid CLA. This MAC is used to eradicate the excess accumulation process that has greatest delay. This proposed CLA has low combinational delay path when compared with conventional CLA. Also the Full Adder is designed with reversible logic gates which has less garbage values and fan outs. Hence low power and Loss of information is eliminated. This proposed MAC with high speed is used in General purpose Data path structure and Digital Signal Processors. As Future work, this Hybrid CLA along with full adder of reversible logic is used in Low Power Digital Applications, Nanotechnology, MAC Unit ,Filter Convolution, Cryptography and Computer Graphics etc.

#### REFERENCES

[1] Jagadeshwar Rao M<sup>1</sup>, Sanjay Dubey<sup>2,"</sup>A High Speed Wallace Tree Multiplier Using Modified Booth Algorithm for Fast Arithmetic Circuits", *IOSR Journal of Electronics and Communication Engineering (IOSRJECE) ISSN:* 2278-2834,*ISBN No:* 2278-8735 *Volume 3, Issue 1 (Sep-Oct 2012), PP 07-11* 

[2] Chinababu Vanama1, M.Sumalatha "Implementation of High Speed Modified Booth Multiplier and Accumulator (Mac) Unit" IOSR Journal of Electronics and Communication Engineering (IOSR-JECE) e-ISSN: 2278-2834,p- ISSN: 2278-8735.Volume 8, Issue 5 (Nov. - Dec. 2013), PP 17

[3] Raghava Garipelly, P.Madhu Kiran, A.Santhosh Kumar, "A Review on Reversible Logic Gates and their Implementation," *International Journal of Emerging Technology and Advanced Engineering (ISSN 2250-2459, ISO 9001:2008 Certified Journal, Volume 3, Issue 3, March 2013).* 

[4] Yeh, Wen-Chang, and Chein-Wei Jen. "Highspeed Booth encoded parallel multiplier design." *Computers, IEEE Transactions* on 49.7 (2000): 692-701.

[5]Elguibaly, Fayez." A fast parallel multiplier accumulator using the modified Booth algorithm." *Circuits and Systems II: Analog and Digital Signal Processing*, IEEE Transactions on 47.9 (2000): 902- 908.

[6] Kim, Soojin, and Kyeongsoon Cho. "Design of high-speed modified booth multipliers operating at GHz ranges." *World Academy of Science, Engineering and Technology* 61 (2010): 1-4.

7] Seo, Young-Ho and Dong-Wook Kim," Anew VLSI Architecture of Parallel Multiplier-Accumulator based on Radix-2 Modified Booth Algorithm" Very Large Scale Integration(VLSI) System, IEEE Transactions on 18.2(2010):201-208[2].[8] A. R. Omondi, *Computer Arithmetic Systems*. Englewood Cliffs, NJ:Prentice-Hall, 1994.

[9] Fatemeh Karami.H and Ali K. Horestani, "New

Structure for Adder with Improved Speed, Area and

Power," 2nd IEEE International Conference on Networked Embedded Systems for Enterprise Applications Perth, Australia, December 2011

# IJSER